# Multimodal Robot Control
Models currently listed under this category:

| Model | License | Description | Tags | Organization | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- |
| STEVE-R1-7B-SFT-GGUF | Apache-2.0 | Static quantized version of STEVE-R1-7B-SFT, offering multiple quantization levels for different hardware requirements (loading sketch below). | Text-to-Image, English | mradermacher | 203 | 0 |
| MiniVLA VQ Bridge Prismatic | MIT | A more compact yet higher-performing vision-language-action model, compatible with the Prismatic VLMs codebase. | Image-to-Text, Transformers, English | Stanford-ILIAD | 22 | 0 |
| RDT-170M | MIT | A 170-million-parameter imitation-learning diffusion Transformer for robot vision-language-action tasks. | Multimodal Fusion, Transformers, English | robotics-diffusion-transformer | 278 | 7 |
| RDT-1B | MIT | A 1-billion-parameter imitation-learning diffusion Transformer pretrained on 1M+ multi-robot operation episodes, supporting multi-view vision-language-action prediction. | Multimodal Fusion, Transformers, English | robotics-diffusion-transformer | 2,644 | 80 |
| Octo Small | MIT | A robot control model trained with a diffusion policy that predicts 7-dimensional actions for the next 4 steps; suited to multi-source robot datasets (usage sketch below). | Multimodal Fusion, Transformers | rail-berkeley | 335 | 13 |
| Octo Base | MIT | A robot control foundation model trained with a diffusion policy, predicting future actions from multimodal inputs. | Multimodal Fusion, Transformers | rail-berkeley | 215 | 21 |
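For the GGUF entry, a minimal loading sketch: the repo id, quant filename, and prompt below are assumptions for illustration (the listing does not give exact file names), and they presume the quantized files run under the llama-cpp-python bindings.

```python
# A minimal sketch: repo id and filename are assumed and may not match the
# actual quant files; check the model page for the real names.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Q4_K_M is a common mid-size quant level; smaller levels (e.g. Q2_K) trade
# accuracy for a lower memory footprint, larger ones (Q6_K, Q8_0) the reverse.
model_path = hf_hub_download(
    repo_id="mradermacher/STEVE-R1-7B-SFT-GGUF",   # assumed repo id
    filename="STEVE-R1-7B-SFT.Q4_K_M.gguf",        # assumed filename pattern
)

llm = Llama(model_path=model_path, n_ctx=4096)
out = llm("Plan the next step for the robot:", max_tokens=64)
print(out["choices"][0]["text"])
```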
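For the Octo models, a minimal usage sketch based on the interface documented in the rail-berkeley octo codebase (`OctoModel.load_pretrained`, `create_tasks`, `sample_actions`); the observation key names, image resolution, and checkpoint path are assumptions that vary across Octo releases, so treat this as a sketch rather than a definitive call sequence.

```python
# A minimal sketch following the octo repo's documented interface; observation
# keys ("image_primary", "timestep_pad_mask") and image size differ across
# Octo releases, and the dummy zero image stands in for a real camera frame.
import jax
import numpy as np
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-small")

# One third-person camera frame with an observation-history window of 1.
observation = {
    "image_primary": np.zeros((1, 1, 256, 256, 3), dtype=np.uint8),  # placeholder frame
    "timestep_pad_mask": np.array([[True]]),
}
task = model.create_tasks(texts=["pick up the red block"])

# Per the listing, the model predicts a chunk of 7-dimensional actions for the
# next 4 steps, so the sampled array should have shape (batch, 4, 7).
actions = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))
print(actions.shape)
```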